1 Introduction


A primary contribution of theoretical computer science has been the identification of the so-called NP-complete problems, a well-known class of problems provably equivalent to one another in worst-case computational complexity, modulo polynomial-time computation. These problems, being the “hardest” in the class NP, are widely believed to be unsolvable by any polynomial-time algorithm, and indeed, no sub-exponential time algorithm is known for any NP-complete problem.

Nevertheless, at least since the late 1970s, algorithms have been known that can solve assorted NP-complete problems in polynomial time in the average case, i.e., whose expected running time on an instance chosen randomly (according to some “natural” distribution) is bounded by a polynomial. Johnson [13] surveys a number of these results, including, for instance, expected polynomial-time algorithms for finding Hamiltonian circuits in random graphs, and for 3-coloring random graphs. Typically, such algorithms are based on the observation that almost all random instances have some easily observed property that makes the decision problem trivial; the remaining few instances can then be solved by an exponential-time, brute-force algorithm. For example, almost all random graphs contain 4-cliques, which make them trivially non-3-colorable; in the extremely unlikely event that the randomly chosen graph does not contain a 4-clique, a brute-force strategy can be used to determine whether the graph is 3-colorable.
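As an illustration of this two-phase strategy, here is a minimal Python sketch of my own (it is not taken from Johnson's survey). The graph representation (vertices 0, ..., n − 1 plus a list of undirected edges) and the function names are assumptions made for the example. The brute-force phase is exponential; the point of the strategy is only that a random graph rarely reaches it.

```python
from itertools import combinations, product

def has_4_clique(n, edges):
    """True if the undirected graph on vertices 0..n-1 contains a 4-clique (O(n^4) check)."""
    adjacent = {frozenset(e) for e in edges}
    return any(
        all(frozenset(pair) in adjacent for pair in combinations(quad, 2))
        for quad in combinations(range(n), 4)
    )

def three_colorable_brute_force(n, edges):
    """Try all 3^n colorings; exponential time."""
    return any(
        all(coloring[u] != coloring[v] for u, v in edges)
        for coloring in product(range(3), repeat=n)
    )

def three_colorable(n, edges):
    # Common case on random graphs: a 4-clique certifies non-3-colorability.
    if has_4_clique(n, edges):
        return False
    # Rare case: fall back to exhaustive search.
    return three_colorable_brute_force(n, edges)
```

For example, the odd wheel (a 5-cycle plus a hub joined to every cycle vertex) contains no 4-clique, so it is decided by the brute-force phase, which correctly reports that it is not 3-colorable.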

Given such results, one may naturally wonder whether there exist any problems that are “hard” on average, and if so, how one might go about identifying such problems and proving their hardness. One approach, first suggested by Levin, is to follow the strategy set forth in the theory of worst-case complexity: prove completeness for a class of problems using some appropriate notion of reducibility. Levin [14, 15] introduced his notion of average-case completeness, analogous to the usual worst-case completeness, in 1984. In his setting, problems consist of two parts: a decision problem, and a distribution on instances. The class DistNP consists of those problems whose decision problem is in NP, and whose distribution is computable in polynomial time (more details in later sections). Levin shows that a tiling problem, under an “almost” uniform distribution on the instances, is complete for DistNP. Thus, if this tiling problem is computable in polynomial time on average, then so is every problem in the class DistNP, which is seemingly strong evidence that the problem is hard on average.

Levin’s original paper was virtually incomprehensible in its terseness, recommended by Johnson [13] only for “cryptoanalytically inclined readers.” Fortunately, Gurevich [9], Gurevich and McCauley [10], and Goldreich [5] have since provided the community the valuable service of deciphering and explaining Levin’s one-and-a-half page note in expositions that far exceed Levin’s both in length and clarity.

Gurevich [6, 9, 8] also managed to prove the completeness for DistNP of a few other moderately natural problems, and Venkatesan and Levin [17] were later able to find a complete graph coloring problem. Nevertheless, in general, there has been a great dearth of such results, sharply contrasting with the hundreds of natural problems known to be NP-complete [4]; apparently, proving completeness for DistNP is much harder than for NP.

In the meantime, some theoretical aspects of average-case complexity, such as the relationship of DistNP to other complexity classes, have been studied by Gurevich [6, 9] and Ben-David et al. [2]. Some of these will be described in later sections.

In this paper, I will review the development of the theory of average-case completeness outlined above. Where possible, I have also tried to make contributions to this theory. Among these contributions is an alternative characterization of “polynomial on average” that seems to simplify some of the proofs found in the literature, and that perhaps is more intuitive than the “standard” definition proposed by Levin, to which it is equivalent. I also introduce in Section 4 a new and more liberal notion of “easy on average” that may be more appropriate in some settings. Finally, in Section 5, I have organized and contributed to what is known about the relationships among the various new average-case complexity classes.

2 A model for studying average-case complexity

The notation and terminology presented in this section are adopted for the most part from Goldreich [5]. We will assume for simplicity that Σ = {0,1} is our input alphabet, and that Σ*, the set of finitely long strings over Σ, is ordered in the usual lexicographic order: 0, 1, 00, 01, .... (To avoid irritating difficulties at a later point, the empty string is omitted.) We write x < y if x comes before y in this ordering, and we denote by |x| the length of x in symbols.
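Because later arguments refer to the predecessor x − 1 of a string, it may help to see this ordering made concrete. The following is a minimal sketch, assuming strings are Python `str` values over '0' and '1'; the helper names are hypothetical.

```python
def index_of(x: str) -> int:
    """Position of a nonempty binary string in the order 0, 1, 00, 01, 10, 11, 000, ..."""
    if not x or set(x) - {"0", "1"}:
        raise ValueError("expected a nonempty binary string")
    # The 2 + 4 + ... + 2^(|x|-1) = 2^|x| - 2 shorter strings come first.
    return (1 << len(x)) - 2 + int(x, 2)

def string_at(i: int) -> str:
    """Inverse of index_of: the string at position i (counting from 0)."""
    n = (i + 2).bit_length() - 1          # length of the i-th string
    return format(i - ((1 << n) - 2), f"0{n}b")

def predecessor(x: str) -> str:
    """x - 1 in this ordering; the first string "0" has none."""
    i = index_of(x)
    if i == 0:
        raise ValueError('"0" has no predecessor')
    return string_at(i - 1)
```

With these helpers, x < y simply means `index_of(x) < index_of(y)`.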

We begin with a discussion of distributions. Naturally, the average-case behavior of a program is dependent upon the distribution against which the “average” is being taken. A density function is a real-valued function μ': Σ* → [0, 1] mapping strings to values between 0 and 1, and for which $\sum_{x \in \Sigma^*} \mu'(x) = 1$. Thus, μ'(x) can be interpreted as the probability that x is chosen. The associated distribution function μ: Σ* → [0, 1] is defined by

$$\mu(x) = \sum_{y \le x} \mu'(y).$$

Clearly, μ is nondecreasing and approaches 1 asymptotically.
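For a concrete (hypothetical) example, take the density that picks a length n with probability 1/(n(n+1)) and then a string of that length uniformly, so that $\mu'(x) = 2^{-|x|}/(|x|(|x|+1))$; these values sum to 1 over all nonempty strings. The sketch below computes μ directly from its definition by summing μ' over every y ≤ x. That touches about $2^{|x|+1}$ strings. It uses exact rational arithmetic from Python's `fractions` module.

```python
from fractions import Fraction

def density(x: str) -> Fraction:
    """Hypothetical example density: mu'(x) = 2^-|x| / (|x| (|x| + 1))."""
    n = len(x)
    return Fraction(1, n * (n + 1) * 2 ** n)

def distribution(x: str) -> Fraction:
    """mu(x) = sum of mu'(y) over all nonempty y <= x, by direct summation."""
    total = Fraction(0)
    for n in range(1, len(x) + 1):
        # All strings of a shorter length precede x; of length |x|, only those up to x.
        last = 2 ** n - 1 if n < len(x) else int(x, 2)
        for k in range(last + 1):
            total += density(format(k, f"0{n}b"))
    return total
```

Along the ordering 0, 1, 00, ..., these values increase strictly (every density is positive) and stay below 1.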
A distributional problem is a pair (D, μ), where D: Σ* → {0,1} is a Boolean predicate, and μ is a distribution function.


Defining easy on average

As a first step toward developing a theory of average-case complexity, we will need a set of careful definitions that express appropriately what is meant intuitively by a problem that is easy or hard on average.

We need first a notion of what it means for a function f: Σ* → ℝ⁺ to grow “polynomially on average.” It turns out that the most natural and intuitive definition of such a notion suffers serious deficiencies. In particular, such a definition might require that

$$\sum_{|x|=n} \mu'_n(x) \cdot f(x) = O(n^k) \qquad (1)$$

for all n and some constant k, where $\mu'_n(x) = \mu'(x) / \sum_{|y|=n} \mu'(y)$. Thus, this definition requires that the expected value of f over inputs of length n be bounded by a polynomial in n.

Goldreich [5] and Gurevich [7] give several arguments why this is not the “right” definition. Briefly, these difficulties arise from the fact that the definition is not closed under composition with a polynomial. As a result, the definition is not machine-independent; i.e., an algorithm running in polynomial time on average (under this naive definition) on one Turing machine may no longer have this quality if the machine model is altered slightly, for instance to one that needs quadratically many steps to simulate the original. The definition is also dependent on the manner in which the instances are encoded; for instance, Goldreich gives an example of a graph algorithm that is fast when the input graph is encoded by its incidence matrix, but is slow when the graph is encoded by an adjacency list.
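The failure of closure under composition is easy to see numerically. In the hypothetical example below (my own, in the spirit of the arguments cited above), f(x) = 2^n on the single string 0^n and f(x) = n on every other string of length n, with x uniform over the strings of length n. The naive average of f is then at most n + 1, while that of f², which models a quadratic slowdown, is at least 2^n.

```python
from fractions import Fraction

def f(x: str) -> int:
    """Hypothetical running time: 2^n on the all-zeros string, n elsewhere."""
    n = len(x)
    return 2 ** n if x == "0" * n else n

def naive_average(g, n: int) -> Fraction:
    """E[g(x)] for x uniform over the 2^n strings of length n, as in Equation (1)."""
    total = sum(g(format(k, f"0{n}b")) for k in range(2 ** n))
    return Fraction(total, 2 ** n)
```

Exactly, the two averages are $n + 1 - n/2^n$ and $2^n + n^2 - n^2/2^n$.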

Levin [15] introduces a definition of polynomial on average that, though less intuitive in appearance, succeeds in overcoming these shortcomings. Namely, a function f: Σ* → ℝ⁺ is polynomial on average with respect to distribution μ if there exists some constant δ > 0 such that

$$\sum_{x \in \Sigma^*} \mu'(x) \cdot \frac{f(x)^{\delta}}{|x|} < \infty.$$
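Levin's definition handles the same kind of example gracefully. In the sketch below, f is (hypothetically) $2^n$ on $0^n$ and n elsewhere. As an assumption for the example, μ' picks length n with probability 1/(n(n+1)) and then a string of that length uniformly. Grouping the sum by length gives closed-form terms. The partial sums settle down with δ = 1/2 for f and with δ = 1/4 for f², whereas δ = 1 for f² blows up. This illustrates only that the witness δ must shrink when f is composed with a polynomial.

```python
def expected_power(n: int, e: float) -> float:
    """E[f(x)^e] for x uniform over length-n strings, where (hypothetically)
    f(0^n) = 2^n and f(x) = n for the other 2^n - 1 strings."""
    return 2.0 ** (n * e - n) + (1 - 2.0 ** -n) * n ** e

def levin_partial_sum(power: int, delta: float, N: int) -> float:
    """Sum over 1 <= |x| <= N of mu'(x) * (f(x)^power)^delta / |x|,
    with mu' choosing length n w.p. 1/(n(n+1)) and then x uniformly."""
    return sum(
        expected_power(n, power * delta) / (n * (n + 1)) / n
        for n in range(1, N + 1)
    )
```

A finite partial sum can only suggest convergence, but here the closed form settles it: when the effective exponent is 1/2, the length-n term is $O(n^{-5/2})$, so the full series converges.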


Here I propose an alternative, equivalent formulation of polynomial on average that may be more intuitively appealing, and that will be useful in proving some of the results that follow. This formulation also generalizes more smoothly to other notions, such as logarithmic on average, considered by Ben-David et al. [2].

A function f: Σ* → ℝ⁺ is usually bounded by a function p: ℕ × ℝ⁺ → ℝ⁺ with respect to distribution μ if, for all ε > 0,

$$\Pr_{\mu}\left[f(x) > p(|x|, 1/\varepsilon)\right] < \varepsilon,$$

where the probability is computed over x chosen randomly according to μ. Thus p(·, 1/ε) bounds f on all instances except a set of total probability less than ε.

When p is restricted to be a polynomial, we obtain Levin’s notion of polynomial on average.
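On a finite distribution the defining inequality can be checked directly for any finite list of ε values. The following is a minimal sketch, with the distribution given as a dictionary from strings to exact probabilities (an assumed representation). Passing such a check is evidence, not proof, since the definition quantifies over all ε > 0.

```python
from fractions import Fraction

def violation_mass(dist, f, p, eps):
    """Pr_mu[f(x) > p(|x|, 1/eps)] for a finite distribution dist = {x: probability}."""
    return sum((prob for x, prob in dist.items() if f(x) > p(len(x), 1 / eps)), Fraction(0))

def usually_bounded_on(dist, f, p, eps_values):
    """Check Pr_mu[f(x) > p(|x|, 1/eps)] < eps for each eps in eps_values."""
    return all(violation_mass(dist, f, p, eps) < eps for eps in eps_values)
```

For instance, take x uniform over the strings of length 10, and let g(x) be the square of a function that equals $2^{|x|}$ on the all-zeros string and |x| elsewhere. Then the polynomial $(|x|/\varepsilon)^2$ passes this check for ε = 1, 1/2, ..., $2^{-20}$, while $|x|^2$ alone does not.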

Lemma 1 Let f: Σ* → ℝ⁺, and let μ be a distribution function. Then f is polynomial on average if and only if f is usually bounded by a polynomial (with respect to μ).

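Proof: Let δ > 0 witness that f is polynomial on average. Then the expected value of $f(x)^{\delta}/|x|$ is less than some number N. By Markov’s inequality, it follows that, for ε > 0,

$$\Pr_{\mu}\left[\frac{f(x)^{\delta}}{|x|} > \frac{N}{\varepsilon}\right] < \varepsilon,$$

or equivalently,

$$\Pr_{\mu}\left[f(x) > \left(\frac{N|x|}{\varepsilon}\right)^{1/\delta}\right] < \varepsilon.$$

That is, f is usually bounded by the polynomial $(N|x|/\varepsilon)^{1/\delta}$.

Conversely, suppose without loss of generality that f is usually bounded by the polynomial $(k|x|/\sqrt{\varepsilon})^k$, for some constant k > 0. (Since |x| ≥ 1, any polynomial in |x| and 1/ε is dominated by an expression of this form for sufficiently large k.) It follows that, for ε > 0,

$$\Pr_{\mu}\left[\frac{f(x)^{1/k}}{|x|} > \frac{k}{\sqrt{\varepsilon}}\right] < \varepsilon,$$

and so, for t > 0,

$$\Pr_{\mu}\left[\frac{f(x)^{1/k}}{|x|} > t\right] < \frac{k^2}{t^2}.$$

Thus,

$$\sum_{x \in \Sigma^*} \mu'(x) \cdot \frac{f(x)^{1/k}}{|x|} \le \sum_{t=1}^{\infty} t \cdot \Pr_{\mu}\left[t-1 < \frac{f(x)^{1/k}}{|x|} \le t\right] = \sum_{t=0}^{\infty} \Pr_{\mu}\left[\frac{f(x)^{1/k}}{|x|} > t\right] \le 1 + k^2 \sum_{t=1}^{\infty} \frac{1}{t^2} < \infty,$$

and therefore 1/k witnesses that f is polynomial on average. ■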

It is now easy to see why this definition of polynomial on average is closed under composition with a polynomial: if f is usually bounded by polynomial p, then clearly $f^c$ is usually bounded by polynomial $p^c$, for any constant c > 0.

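Also, it is not hard to show that the “naive” definition of polynomial on average implies the correct definition. For suppose f satisfies Equation (1), so that, for some k > 0 and all n,

$$\mathbb{E}_{\mu_n}\left[f(x)\right] < k n^k,$$

where $\mu_n$ denotes the conditional distribution on strings of length n, with density $\mu'_n$. Then by Markov’s inequality, for ε > 0,

$$\Pr_{\mu_n}\left[f(x) > k n^k / \varepsilon\right] < \varepsilon.$$

Since μ is a weighted average of the conditional distributions $\mu_n$, this implies that

$$\Pr_{\mu}\left[f(x) > k|x|^k / \varepsilon\right] < \varepsilon,$$

and so f is usually bounded by the polynomial $k|x|^k/\varepsilon$.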

Thus, an algorithm A that runs in time polynomial on average can be thought of as follows: given ε > 0 and a randomly chosen instance x, A halts in time polynomial in |x| and 1/ε with probability exceeding 1 − ε. Note again that this probability is over the random choice of x, and not over any kind of randomization of A. (In fact, we will usually only consider deterministic algorithms.)

We say that a distributional problem (D, μ) is polynomial time on average if there exists a Turing machine that decides D whose running time is polynomial on average with respect to μ.
Reducibility

We will next require a notion of reducibility. Such a notion should have the property that if $(D_1, \mu_1)$ is reducible to $(D_2, \mu_2)$, and if $(D_2, \mu_2)$ is polynomial time on average, then so is $(D_1, \mu_1)$.

More formally, we say that a function f: Σ* → Σ* reduces distributional problem $(D_1, \mu_1)$ to $(D_2, \mu_2)$ if

1. f is computable in time polynomial on average (with respect to $\mu_1$);

2. for all x ∈ Σ*, $D_1(x) = D_2(f(x))$;

3. for some constant c > 0 and all x ∈ Σ*,

$$\mu_2'(x) \ge \frac{1}{|x|^c} \sum_{y \in f^{-1}(x)} \mu_1'(y).$$

The first two conditions on f are straightforward: the first requires that f be efficiently computable (on average), and the second requires that f be valid in the sense that true instances of $D_1$ are mapped only to true instances of $D_2$, and false instances only to false ones. The third condition is something new: here we require that common instances of $D_1$ not be mapped to rare instances of $D_2$, that is, that the distribution induced on instances of $D_2$ by $\mu_1$ and f not be “too far off” from $\mu_2$.

The term “domination” is used to refer to this relationship between distributions. Formally, distribution $\mu_2$ dominates $\mu_1$ if there exists a constant c > 0 such that $\mu_2'(x) \ge |x|^{-c} \mu_1'(x)$ for all x ∈ Σ*. Thus, the last condition of the above reducibility definition states that $\mu_2$ dominates the induced distribution $\mu_{1f}$ defined by

$$\mu_{1f}'(x) = \sum_{y \in f^{-1}(x)} \mu_1'(y).$$
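For finite distributions, the induced density and the domination test can be computed directly. The following is a minimal sketch, assuming densities are given as dictionaries {string: probability} and c is a fixed exponent; the sample reduction `append_zero` is hypothetical. On a finite support some c always works, so this illustrates only the mechanics of the condition, which matters asymptotically.

```python
from collections import defaultdict
from fractions import Fraction

def induced_density(mu1, f):
    """mu_1f'(y) = sum of mu1'(x) over all x with f(x) = y (finite support)."""
    out = defaultdict(Fraction)
    for x, prob in mu1.items():
        out[f(x)] += prob
    return dict(out)

def dominates(mu2, mu1, c):
    """True if mu2'(x) >= |x|^(-c) * mu1'(x) for every x in the support of mu1."""
    return all(mu2.get(x, Fraction(0)) * len(x) ** c >= prob for x, prob in mu1.items())

def append_zero(x: str) -> str:
    # Hypothetical reduction, used only for illustration.
    return x + "0"
```

With $\mu_1$ uniform on strings of length 3 and $\mu_2$ uniform on strings of length 4, the induced density gives each image 1/8 against $\mu_2$'s 1/16. The exponent c = 1 therefore suffices, while c = 0 does not.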

Finally, we are ready to prove the following theorem, which justifies the preceding definitions:

Theorem 1 Let f reduce $(D_1, \mu_1)$ to $(D_2, \mu_2)$, and suppose that $(D_2, \mu_2)$ is polynomial time on average. Then so is $(D_1, \mu_1)$.

This theorem is presented in detail by Goldreich [5] and is re-proved here as an exercise in the characterization of polynomial on average provided by Lemma 1.

Following Goldreich, we break the proof into two steps:

Step 1 Let $\mu_{1f}$ be the distribution on instances of $D_2$ induced by $\mu_1$ and f, and suppose that $(D_2, \mu_{1f})$ is polynomial time on average. Then so is $(D_1, \mu_1)$.

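Proof: By Lemma 1, there exists an algorithm B solving $D_2$ in time $t_B(x)$, a function usually bounded by some polynomial $p_B(|x|, 1/\varepsilon)$ with respect to $\mu_{1f}$. Likewise, f is computable in time $t_f(x)$, which is usually bounded by some polynomial $p_f(|x|, 1/\varepsilon)$ with respect to $\mu_1$. (We may assume that $p_B$ and $p_f$ have nonnegative coefficients, so that they are nondecreasing in each argument.) An instance x is computed in the obvious manner as A(x) = B(f(x)) in time $t_A(x) = t_f(x) + t_B(f(x))$. Then $t_A(x)$ is usually bounded by the polynomial

$$p_A(|x|, 1/\varepsilon) = p_f(|x|, 2/\varepsilon) + p_B\big(p_f(|x|, 2/\varepsilon), 2/\varepsilon\big).$$

This can be seen as follows: given ε > 0,

$$
\begin{aligned}
\Pr_{\mu_1}\left[t_A(x) > p_A(|x|, 1/\varepsilon)\right]
&\le \Pr_{\mu_1}\left[t_f(x) > p_f(|x|, 2/\varepsilon) \,\lor\, t_B(f(x)) > p_B\big(p_f(|x|, 2/\varepsilon), 2/\varepsilon\big)\right] \\
&= \Pr_{\mu_1}\left[t_f(x) > p_f(|x|, 2/\varepsilon)\right] + \Pr_{\mu_1}\left[t_f(x) \le p_f(|x|, 2/\varepsilon) \,\land\, t_B(f(x)) > p_B\big(p_f(|x|, 2/\varepsilon), 2/\varepsilon\big)\right] \\
&< \frac{\varepsilon}{2} + \Pr_{\mu_1}\left[t_B(f(x)) > p_B(|f(x)|, 2/\varepsilon)\right] \\
&= \frac{\varepsilon}{2} + \Pr_{\mu_{1f}}\left[t_B(x) > p_B(|x|, 2/\varepsilon)\right] \\
&< \varepsilon,
\end{aligned}
$$

where the second inequality follows from the fact that $|f(x)| \le t_f(x)$. Thus, $t_A$ is usually bounded by a polynomial as claimed, and $(D_1, \mu_1)$ is polynomial time on average. ■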

Step 2 Suppose $\mu_2$ dominates $\mu_1$ and that $(D, \mu_2)$ is polynomial time on average. Then so is $(D, \mu_1)$.


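Proof: Let c > 0 witness that $\mu_2$ dominates $\mu_1$, and suppose A solves D in time $t_A(x)$, which is usually bounded by $p_A(|x|, 1/\varepsilon)$ with respect to $\mu_2$. Then $t_A$ is usually bounded by $p_A(|x|, 2|x|^{c+2}/\varepsilon)$ with respect to $\mu_1$: given ε > 0,

$$
\begin{aligned}
\Pr_{\mu_1}\left[t_A(x) > p_A(|x|, 2|x|^{c+2}/\varepsilon)\right]
&= \sum_{n=1}^{\infty} \Pr_{\mu_1}\left[|x| = n \,\land\, t_A(x) > p_A(n, 2n^{c+2}/\varepsilon)\right] \\
&\le \sum_{n=1}^{\infty} n^c \cdot \Pr_{\mu_2}\left[t_A(x) > p_A(|x|, 2n^{c+2}/\varepsilon)\right] \\
&< \sum_{n=1}^{\infty} n^c \cdot \frac{\varepsilon}{2n^{c+2}} = \frac{\pi^2}{12}\,\varepsilon \\
&< \varepsilon,
\end{aligned}
$$

where the first inequality uses $\mu_1'(x) \le |x|^c \mu_2'(x)$ for every x of length n. ■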


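Together, Steps 1 and 2 imply Theorem 1: condition 3 of the reduction states that $\mu_2$ dominates $\mu_{1f}$, so Step 2 shows that $(D_2, \mu_{1f})$ is polynomial time on average, and Step 1 then gives the same for $(D_1, \mu_1)$.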


3 Average-case completeness

Using this notion of reducibility, Levin was able to show that there exists a problem complete for a whole class of problems, i.e., a problem to which every other problem in some class is reducible. Thus, he succeeded in identifying a “hardest” problem in some class, which therefore can only be polynomial on average if every other problem in the class is as well.

To prove his main theorem, it was necessary for Levin to make some “niceness” assumptions about the distributions he was working with, namely, that they be polynomially computable. Specifically, a distribution μ is polynomially computable if there exists a polynomial-time Turing machine that, on input x, computes μ(x) as a binary rational number. Note that the Turing machine must compute μ(x), the probability that the chosen string is some y ≤ x. This is a stronger condition than the requirement that the density μ'(x), the probability of choosing x itself, be computable in polynomial time. Goldreich [5] shows that this is a strictly stronger condition if P ≠ #P.

(Strictly speaking, some of the distributions described in this paper take on irrational values. However, all of these distributions can be accommodated by relaxing this definition to require only that the function cμ be polynomially computable for some constant c > 0. This relaxation does not detract from any of the results described in this paper.)
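To see what polynomial computability buys, consider again, as an assumed example rather than one from the text, the density $\mu'(x) = 2^{-|x|}/(|x|(|x|+1))$. Summing μ' over all y ≤ x takes exponentially many terms, but here μ has a closed form. The shorter strings contribute $\sum_{n<|x|} 1/(n(n+1)) = 1 - 1/|x|$. The strings of length |x| up to x contribute $(\mathrm{int}(x) + 1)\,\mu'(x)$, where int(x) is x read as a binary number. The sketch below checks the closed form against direct summation.

```python
from fractions import Fraction

def density(x: str) -> Fraction:
    """Hypothetical example density: mu'(x) = 2^-|x| / (|x| (|x| + 1))."""
    n = len(x)
    return Fraction(1, n * (n + 1) * 2 ** n)

def mu_closed_form(x: str) -> Fraction:
    """mu(x) with O(|x|)-bit arithmetic: 1 - 1/|x| for all shorter strings,
    plus int(x) + 1 strings of length |x| that do not come after x."""
    m = len(x)
    return 1 - Fraction(1, m) + (int(x, 2) + 1) * density(x)

def mu_by_summation(x: str) -> Fraction:
    """Exponential-time reference: add mu'(y) for every y <= x."""
    m = len(x)
    shorter = sum(
        (density(format(k, f"0{n}b")) for n in range(1, m) for k in range(2 ** n)),
        Fraction(0),
    )
    same_length = sum(
        (density(format(k, f"0{m}b")) for k in range(int(x, 2) + 1)),
        Fraction(0),
    )
    return shorter + same_length
```

The closed form uses a constant number of arithmetic operations on numbers of O(|x|) bits, so this μ is polynomially computable.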

We are now ready to introduce the class of distributional NP problems, or DistNP. A distributional problem (D, μ) is in DistNP if D is in NP, and μ is polynomially computable.

Note that even showing that a problem in DistNP requires more than polynomial time in the worst case (let alone on average) would show that P ≠ NP, so such a result seems out of reach. Nevertheless, it is possible to find a complete problem for DistNP. A distributional problem (D, μ) is complete for DistNP if (D, μ) is in DistNP, and every other problem in DistNP can be reduced to (D, μ). Thus, if (D, μ) is polynomial time on average, then so is every problem in DistNP.

A problem complete for DistNP

Levin [15] showed that a tiling problem under a near-uniform distribution is complete for DistNP. On close analysis of his proof, Goldreich [5] and Gurevich [9] found that Levin’s proof could be simplified by first showing that a generic bounded halting problem is complete, a problem that can then be reduced to tiling.

In particular, the Bounded Halting Problem is the following:

Instance: An encoding M of a nondeterministic Turing machine, a word x, and a number t in unary.

Question: Does the machine encoded by M accept x within t steps?

Distribution: The values of t, |M| and |x| are chosen first with probability proportional to an inverse quadratic. Then M and x are chosen uniformly from all strings of the given length. Thus, $\mu_{BH}'(M, x, 1^t) \propto |M|^{-2} \cdot 2^{-|M|} \cdot |x|^{-2} \cdot 2^{-|x|} \cdot t^{-2}$.
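A sampler makes the two-stage description concrete. The sketch below departs from the stated distribution in one respect, flagged in the code. It draws each of t, |M| and |x| with probability 1/(n(n+1)), obtained as ⌊1/U⌋ for U uniform on (0, 1], rather than exactly proportional to $1/n^2$; the two agree up to a constant factor. It also returns t as an integer standing for the unary string $1^t$, to avoid building very long strings.

```python
import random

def sample_length(rng: random.Random) -> int:
    """Draw n >= 1 with probability 1/(n(n+1)), a stand-in (assumption) for the
    inverse-quadratic law: floor(1/U) = n exactly when 1/(n+1) < U <= 1/n."""
    u = 1.0 - rng.random()          # uniform on (0, 1]
    return int(1.0 / u)

def sample_bounded_halting_instance(rng: random.Random):
    """Draw (M, x, t): first t, |M| and |x|, then M and x uniformly of those lengths.
    t stands for the unary string 1^t."""
    t = sample_length(rng)
    m_len = sample_length(rng)
    x_len = sample_length(rng)
    M = format(rng.getrandbits(m_len), f"0{m_len}b")
    x = format(rng.getrandbits(x_len), f"0{x_len}b")
    return M, x, t
```

About half of all draws from `sample_length` are 1 and about one in six are 2, matching 1/(1·2) and 1/(2·3).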

Levin’s main result (as interpreted by Goldreich) is the following:

Theorem 2 The Bounded Halting Problem is complete for DistNP.

Proof: Goldreich [5] gives a clear and careful proof of this theorem. Here I only try to distill some of the main ideas. Let $(BH, \mu_{BH})$ denote the Bounded Halting Problem when decomposed into the associated predicate BH and distribution $\mu_{BH}$. That $(BH, \mu_{BH})$ is in DistNP is easily verified.

Let (D, μ) be any distributional problem in DistNP. We wish to reduce (D, μ) to $(BH, \mu_{BH})$. We know that D is accepted by some nondeterministic machine M with running time bounded by some polynomial p. The usual (worst-case) reduction of D to BH would map an instance x of D to instance $(M, x, 1^{p(|x|)})$ of BH. The problem with this reduction is that it fails the domination condition: an extremely common instance with probability, say, $|x|^{-2}$ gets mapped to a far rarer instance with probability proportional to $|x|^{-2} \cdot 2^{-|x|}$.

The main insight needed to overcome this difficulty is the following: we would like to map every instance x to the shortest string possible, since $\mu_{BH}$ assigns higher probability to shorter instances. Moreover, if x is a very common instance (so that μ'(x) is large), then x can be more compactly represented using the (polynomially computable) function μ.

In particular, x can be encoded by any binary fraction in the interval (μ(x − 1), μ(x)], where x − 1 is the predecessor of x. Furthermore, such an encoding can be efficiently and uniquely decoded using a kind of binary search, since μ is polynomially computable. Finally, since this interval has length exactly μ'(x), there always exists a fraction $\alpha_\mu(x)$ in it whose binary expansion is of length $\lg(1/\mu'(x)) + O(1)$.
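The encoding and its binary-search decoding can be sketched as follows. This is a minimal illustration under stated assumptions: the distribution is the hypothetical $\mu'(x) = 2^{-|x|}/(|x|(|x|+1))$, chosen because its distribution function has a simple closed form; strings are ordered 0, 1, 00, 01, ...; and all arithmetic is exact. `encode` returns the bits of the shortest binary fraction in (μ(x − 1), μ(x)], and `decode` recovers x as the least string y with μ(y) ≥ α.

```python
import math
from fractions import Fraction

def density(x: str) -> Fraction:
    """Hypothetical example: mu'(x) = 2^-|x| / (|x| (|x| + 1))."""
    n = len(x)
    return Fraction(1, n * (n + 1) * 2 ** n)

def mu(x: str) -> Fraction:
    """Closed-form distribution function for this density."""
    m = len(x)
    return 1 - Fraction(1, m) + (int(x, 2) + 1) * density(x)

def string_at(i: int) -> str:
    """The i-th string (from 0) in the order 0, 1, 00, 01, 10, 11, 000, ..."""
    n = (i + 2).bit_length() - 1
    return format(i - ((1 << n) - 2), f"0{n}b")

def encode(x: str) -> str:
    """Bits b1...bk of the shortest binary fraction 0.b1...bk in (mu(x-1), mu(x)]."""
    lo, hi = mu(x) - density(x), mu(x)      # mu(x - 1) = mu(x) - mu'(x)
    k = 1
    while True:
        m = math.floor(lo * 2 ** k) + 1     # least m with m / 2^k > lo
        if Fraction(m, 2 ** k) <= hi:
            return format(m, f"0{k}b")
        k += 1

def decode(code: str) -> str:
    """Binary search for the least y with mu(y) >= alpha; since alpha lies in
    (mu(x-1), mu(x)], that y is x."""
    alpha = Fraction(int(code, 2), 2 ** len(code))
    hi = 1
    while mu(string_at(hi)) < alpha:        # find an upper bound by doubling
        hi *= 2
    lo = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if mu(string_at(mid)) >= alpha:
            hi = mid
        else:
            lo = mid + 1
    return string_at(lo)
```

For this nearly uniform μ, $1/\mu'(x) = |x|(|x|+1)2^{|x|}$, so the code offers essentially no savings over x itself. Compression pays off only for strings of unusually large density, which is why $C_\mu(x)$ keeps whichever representation is shorter.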

Thus, any string x can be efficiently encoded by $C_\mu(x)$, the shorter of x itself and $\alpha_\mu(x)$. Note that the density on strings induced by this compression scheme is very flat: every string y has induced density $O(2^{-|y|})$. Note also that there exists a Turing machine $M_\mu$ that, given a compressed string $C_\mu(x)$, first decodes x, and then (nondeterministically) simulates M on x to decide D(x) in time bounded by some polynomial $p_\mu(|x|)$.

The rest of the reduction is straightforward: an instance x of (D, μ) is mapped to $(M_\mu, C_\mu(x), 1^{p_\mu(|x|)})$. It can be checked that this reduction now satisfies the domination condition.


Other complete problems

So Bounded Halting is a canonical problem complete for DistNP. With this proved, it is possible to prove the completeness of a handful of other problems to which Bounded Halting can be reduced. For example, a straightforward reduction shows that the following variant of the tiling problem is complete:
Instance: A set of “legal” tiles $L \subseteq \mathcal{R}^4$, each labeled in the corners with one of the twenty-six letters of the Roman alphabet $\mathcal{R}$; a number t in unary; and a legal string σ of tiles from L of length at most t.

Question: Can σ be extended to a tiling of a t × t square using tiles only from L?

Distribution: L is chosen uniformly at random from the subsets of $\mathcal{R}^4$, t is chosen with probability proportional to $t^{-2}$, |σ| is chosen uniformly from {1, ..., t}, and σ is chosen uniformly from all legal strings of this length.

In a standard reduction, an instance $(M, x, 1^t)$ is mapped to $(L_0, \sigma, 1^t)$, where $L_0$ encodes the legal computations of a universal Turing machine, and σ encodes (M, x). Since $L_0$ has some constant probability of being chosen under the above distribution, the domination condition is satisfied. This is exactly why such a reduction succeeds in this case, but is bound to fail in others.

For example, in the standard proof of the NP-completeness of satisfiability of CNF formulas, the computations of a Turing machine are encoded not in one place (such as the set of legal tiles in the tiling problem), but again and again throughout the formula. More specifically, if $x_{ij}$ represents the j-th bit of an instantaneous description of the encoded machine M on the i-th step, then $x_{ij}$ is some function of a few variables from step i − 1, which can be encoded by a constant-length formula. However, whatever this formula is, it must be repeated for each variable $x_{ij}$, and the chance of such a repetition of this pattern occurring in a random formula becomes exponentially small. This appears to be the primary reason why it has proved so difficult to show the completeness of other more natural problems.

Nevertheless, Venkatesan and Levin [17] did manage to come up with a graph coloring problem that is hard on average. Their result is interesting and surprising because graph problems have until now proved to be an excellent source of NP-complete problems that are easy on average. Their technique is also of interest: in essence, they prove their hardness result using the very methods used in the past to prove the easiness of other random graph problems.

Here is a description of the problem they consider: Let G be a directed graph, each of whose edges has been assigned one of the four colors blank, black, red or green. A spot is an induced three-node, unlabeled subgraph of G. The coloration of G, denoted C(G), is the set of all spots of G.

Their random graph coloring problem can be stated as follows:

Instance: A directed (uncolored) graph G, a coloration C, and a number k.

Question: Can the edges of G be colored so that C(G) = C, and so that the number of blank edges is exactly k?

Distribution: C is chosen uniformly, |G| is chosen with probability proportional to $|G|^{-2}$, k is chosen uniformly from {1, ..., |G|}, and G is chosen uniformly from all graphs of size |G| (i.e., each edge is present with probability 1/2).
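To make the notion of a spot concrete, the sketch below computes C(G) for a small edge-colored directed graph. Two choices are my assumptions rather than details from the paper. First, a spot is represented by a canonical form: the lexicographically smallest description over the six orderings of its three nodes. Second, a pair of nodes with no edge is marked '-', so that "blank" remains the color of an edge that is present.

```python
from itertools import combinations, permutations

def spot(colored_edges, triple):
    """Canonical form of the induced 3-node subgraph on `triple`, ignoring node names.
    colored_edges maps a directed edge (u, v) to a color; absent edges are marked '-'."""
    return min(
        tuple(
            colored_edges.get((order[i], order[j]), "-")
            for i in range(3) for j in range(3) if i != j
        )
        for order in permutations(triple)
    )

def coloration(nodes, colored_edges):
    """C(G): the set of all spots of G."""
    return {spot(colored_edges, triple) for triple in combinations(nodes, 3)}
```

For a directed 4-cycle with all edges red, every 3-node induced subgraph is the same red two-edge path, so C(G) has a single spot. Recoloring one edge green yields three distinct spots.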

Venkatesan and Levin’s main result is a proof that this problem is complete for DistNP. They prove this by a randomized reduction from Tiling (or from Bounded Halting). That is, an instance of Tiling is mapped by a randomized function f to one of a number of possible instances of the graph coloring problem.

Here I sketch some of the high-level ideas of their reduction, which I find easier to think about as a direct reduction from Bounded Halting rather than Tiling. Let $(M, x, 1^t)$ be a Bounded Halting Problem instance. Such an instance is mapped to a graph G on $O(t^2)$ vertices. This graph is random, except for the requirement that it have a number of features. The most important of these is an embedded t × t grid of $t^2$ vertices; that is, each vertex $v_{ij}$ of this grid is connected to $v_{(i+1)j}$ and $v_{i(j+1)}$. This grid is where the computation of a universal Turing machine is simulated: the coloring of the grid encodes the time-space history of the Turing machine in the usual manner.
coloring  of  the  grid  encodes  the  time-space  history  of  the  Turing  machine  in  the  usual  manner. 

The graph G has a number of other features that, together with the chosen coloration $C_0$, ensure that the coloring of this grid conforms to the computation of a universal Turing machine on (M, x), and thus that the graph is colorable if and only if (M, x) is accepted. This part of the reduction falls into the standard paradigm used in (worst-case) reductions of building “gadgets” to force a particular behavior. What is new is their construction of a graph with features that are likely to be contained by a large fraction of all graphs. That is, they show that an (entirely) random graph will have all of the required features with probability at least $1/n^c$ for some constant c > 0, and thus they are able to show that their reduction satisfies the domination condition.

An incompleteness result

As mentioned above, Venkatesan and Levin’s reduction is randomized. It is not hard to modify Theorem 1 to show that, if f is a randomized function reducing $(D_1, \mu_1)$ to $(D_2, \mu_2)$, and $(D_2, \mu_2)$ is solved in polynomial time on average by a randomized Turing machine, then so is $(D_1, \mu_1)$.

In fact, it turns out that the distributional graph coloring problem described above cannot be proved complete if only deterministic reductions are allowed (unless Exp = NExp): an interesting result of Gurevich [6, 9] shows that if μ is “too close” to being uniform, then the distributional problem (D, μ) cannot be complete for DistNP. I close this section with a description of this intriguing result.

A distribution μ is said to be flat if there exists a constant δ > 0 such that, for all x ∈ Σ*, $\mu'(x) \le 2^{-|x|^{\delta}}$. Thus, each instance has very low density. Note that $\mu_{BH}$ described above is not flat since, by fixing M and x and allowing t to grow, we can find strings of density proportional to $|(M, x, 1^t)|^{-2}$. On the other hand, the distribution on Venkatesan and Levin’s graph coloring problem is flat.
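A quick calculation shows how far $\mu_{BH}$ is from flat. The sketch assumes, for illustration, that each of t, |M| and |x| is drawn with probability exactly $(6/\pi^2)/n^2$. It also assumes that the encoded instance $(M, x, 1^t)$ has length about |M| + |x| + t. Flatness with exponent δ would require $-\lg \mu'(y) \ge |y|^{\delta}$, i.e., a ratio of at least 1 below. With M and x fixed, the ratio tends to 0 as t grows, for any δ > 0.

```python
import math

def neg_log2_density(m_len: int, x_len: int, t: int) -> float:
    """-lg mu_BH'(M, x, 1^t), assuming each of t, |M|, |x| has probability (6/pi^2)/n^2."""
    log_norm = math.log2(math.pi ** 2 / 6)   # -lg(6/pi^2), once per drawn number
    return (3 * log_norm
            + 2 * math.log2(m_len) + m_len      # choose |M|, then M uniformly
            + 2 * math.log2(x_len) + x_len      # choose |x|, then x uniformly
            + 2 * math.log2(t))                 # choose t

def flatness_ratio(m_len: int, x_len: int, t: int, delta: float) -> float:
    """(-lg mu'(y)) / |y|^delta, taking |y| to be |M| + |x| + t (an assumption)."""
    return neg_log2_density(m_len, x_len, t) / (m_len + x_len + t) ** delta
```

With |M| = |x| = 5 and δ = 1/2, the ratio falls below 0.01 once t reaches $10^9$. Even δ = 0.1 fails at $t = 10^{100}$.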

Below, Exp (NExp) is the set of decision problems accepted by deterministic (nondeterministic) Turing machines in exponential (i.e., $2^{n^{O(1)}}$) time. The proof of Gurevich’s theorem is omitted.

Theorem 3 Let (D, μ) ∈ DistNP, and suppose μ is flat. Then (D, μ) is not complete for DistNP, unless Exp = NExp.

4 Easier than easy on average

In this section, I propose a natural liberalization of the notion of easy on average that seems to have been overlooked in the past.

The standard notion of easy on average described in Section 2 requires that there exist an algorithm for solving the decision problem that is always correct; that is, the algorithm must find a certificate that justifies its answer. Thus, for example, it is not enough in deciding graph 3-colorability to observe that most graphs are not 3-colorable: an algorithm must certify that the given graph is not 3-colorable, for instance, by finding a 4-clique.

In some applications, this requirement may be too strong. For example, in designing a pseudorandom bit generator, one would like to say that an adversary is unlikely to guess the next generated bit by any means. It is irrelevant in such a setting whether the adversary has a certificate of the value of the bit; it matters only that he can make a reasonable guess.

The formalism for such a liberalization is motivated by the characterization of polynomial on average given by Lemma 1. Recall that this characterization states that a Turing machine M solves a distributional problem (D, μ) in polynomial time on average if there exists a polynomial p such that, for all ε > 0,

$$\Pr_{\mu}\left[t_M(x) > p(|x|, 1/\varepsilon)\right] < \varepsilon,$$

where $t_M(x)$ is M’s running time on input x. Note that if M’s computation is cut off after $p(|x|, 1/\varepsilon)$ steps (and M is forced to output a default value if it is not yet finished), then the probability that a correct answer is output exceeds 1 − ε.
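The cut-off argument can be mirrored in a short sketch. Modeling a machine as a Python generator that yields once per step is my own simplification. `toy_decider` is a hypothetical algorithm that runs for |x| steps except on the all-zeros string, where it takes $2^{|x|}$ steps. The budget $p(n, 1/\varepsilon) = n/\varepsilon$ is likewise an assumption.

```python
def run_with_budget(computation, budget, default):
    """Run a generator-based computation that yields once per step.
    Return its result if it halts within `budget` steps, else `default`."""
    for _ in range(budget + 1):
        try:
            next(computation)
        except StopIteration as halted:
            return halted.value
    return default

def toy_decider(x: str):
    """Hypothetical decider for "x contains a 1": |x| steps usually, 2^|x| on 0^n."""
    steps = 2 ** len(x) if x == "0" * len(x) else len(x)
    for _ in range(steps):
        yield
    return "1" in x

def approximate_decider(x: str, eps: float):
    """Cut toy_decider off after p(|x|, 1/eps) = |x| / eps steps (assumed budget),
    answering True by default."""
    return run_with_budget(toy_decider(x), int(len(x) / eps), default=True)
```

With x uniform over strings of length 10 and ε = 1/4, only the all-zeros string is answered incorrectly, an error probability of $2^{-10} < \varepsilon$.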

This suggests the following definition: a Turing machine M solves distributional problem (D, μ) approximately in polynomial time if, for all ε > 0,

$$\Pr_{\mu}\left[M(x, \varepsilon) \ne D(x)\right] < \varepsilon,$$

where the probability is over random choices of x (according to μ). Furthermore, M’s running time must be polynomial in |x| and 1/ε.

Note that Theorem 1 can easily be modified to handle reductions in which f is only approximately computable. That is, if f reduces $(D_1, \mu_1)$ to $(D_2, \mu_2)$, and $(D_2, \mu_2)$ is approximately solvable in polynomial time, then so is $(D_1, \mu_1)$. In particular, this shows that the Bounded Halting Problem, as well as the other problems described in Section 3, are complete under such approximate reductions. Thus, if the Bounded Halting Problem is approximately solvable in polynomial time, then so is every other problem in DistNP.

Let AverP denote the class of distributional problems $(D,\mu)$ for which $\mu$ is polynomially computable and which are solvable in polynomial time on average. (This definition differs slightly from those given by Goldreich [5] and Ben-David et al. [2].) Let ApproxP be the class of distributional problems $(D,\mu)$ for which $\mu$ is polynomially computable, and which are approximately solvable in polynomial time. From the preceding remarks, we have:

Theorem 4  AverP $\subseteq$ ApproxP.

Containment  in  the  opposite  direction  is  apparently  an  open  question,  though  my  guess  is  that 
AverP  is  properly  contained  in  ApproxP.  As  suggestive  evidence  (but  not  proof),  I  would  cite 
various  problems  which  are  approximately  solvable,  but  for  which  the  existence  of  an  algorithm 
running  in  time  polynomial  on  average  seems  uncertain.  These  are  described  in  the  following 
subsections. 

ApproxP  and  AverP  algorithms  for  finding  cliques 

To start with, consider a variant of the clique problem. The general clique problem on random graphs (i.e., graphs in which each edge is independently present with probability 1/2) is known to be solvable by an algorithm with expected running time $n^{O(\log n)}$ [9]. Solving Clique in polynomial time on average is open, although Phan Dinh, Le Cong, and Le Tuan [16] made some progress in this regard by restricting the edge probabilities or the total number of edges in the graph. Below, I obtain positive results by instead restricting the size of the clique being sought.

Let $\mathrm{Clique}(k(n))$ be the problem of deciding if an $n$-node graph has a $k(n)$-clique. Let $\mu_0$ be a standard uniform distribution on graphs, i.e., the number of vertices $n$ is chosen with probability proportional to $n^{-2}$, and each edge is present with probability 1/2. Then for certain choices of $k(n)$, $(\mathrm{Clique}(k(n)),\mu_0)$ is in ApproxP, as proved below. Note that the “standard” proof that the Clique problem is NP-complete, as described by Hopcroft and Ullman [12], asks only whether the given $n$-node graph contains an $(n/3)$-clique, and thus shows that $\mathrm{Clique}(n/3)$ is NP-complete.
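To make $\mu_0$ concrete, here is a minimal Python sampler. It is a sketch under assumptions of my own: graphs are represented as lists of adjacency sets, and the number of vertices is drawn by a rejection step (propose $n = \lfloor 1/u \rfloor$, which has probability $1/(n(n+1))$, and accept with probability $(n+1)/(2n)$), a standard trick chosen here for exactness, not a procedure from the paper. The function names are hypothetical.

```python
import math
import random
from itertools import combinations

Z = 6 / math.pi ** 2  # normalizing constant, since the sum of 1/n^2 over n >= 1 is pi^2/6


def sample_num_vertices(rng: random.Random) -> int:
    """Draw n >= 1 with Pr[n] = Z / n^2, by rejection from Pr[n] = 1/(n(n+1))."""
    while True:
        u = 1.0 - rng.random()   # uniform on (0, 1]
        n = int(1.0 / u)         # Pr[n] = 1/n - 1/(n+1) = 1/(n(n+1))
        # Accepting with probability (n+1)/(2n) reweights 1/(n(n+1)) to 1/(2n^2).
        if rng.random() < (n + 1) / (2 * n):
            return n


def sample_graph(n: int, rng: random.Random) -> list[set[int]]:
    """G(n, 1/2): each of the C(n, 2) edges is present independently with probability 1/2."""
    adj: list[set[int]] = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < 0.5:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def sample_mu0(rng: random.Random) -> list[set[int]]:
    """One draw from mu_0 (n is heavy-tailed, so it is occasionally very large)."""
    return sample_graph(sample_num_vertices(rng), rng)
```

The heavy tail is intrinsic to $\mu_0$: $\Pr[n \ge N]$ is about $Z/N$, so rare draws produce very large graphs.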

Theorem 5  Assume $k(n) = \omega(\log n)$ is polynomially computable. Then $(\mathrm{Clique}(k(n)),\mu_0) \in$ ApproxP.


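Proof: The algorithm $A$ that approximately solves this problem is very simple. On input $\epsilon$ and an $n$-node graph $G$, $A$ compares $\epsilon$ with

$$\binom{n}{k}\, 2^{-\binom{k}{2}},$$

where $k = k(n)$. If $\epsilon$ is larger than this number, then $A$ just says “no” (the Nancy Reagan heuristic), since this number bounds the probability that a random $n$-node graph contains a $k$-clique (by a union bound over the $\binom{n}{k}$ subsets of $k$ vertices, each of which forms a clique with probability $2^{-\binom{k}{2}}$). Otherwise, $\epsilon$ is very small, and $A$ does a brute-force search of all $\binom{n}{k}$ subsets of $k$ vertices to determine if $G$ has a $k$-clique. Since $\epsilon$ is so small in this case, and since $k(n) = \omega(\log n)$, the running time is only polynomial in $n$ and $1/\epsilon$: the search takes time roughly $n^k = 2^{k \lg n}$, while $1/\epsilon \ge 2^{\binom{k}{2} - k\lg n}$, and $k \lg n = o(k^2)$. ■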
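The algorithm $A$ of Theorem 5 can be sketched in Python as follows. This is an illustrative sketch, assuming a hypothetical adjacency-set representation and letting the caller supply $k = k(n)$; the threshold is compared exactly with rational arithmetic (my choice, to avoid floating-point underflow of $2^{-\binom{k}{2}}$).

```python
import math
from fractions import Fraction
from itertools import combinations


def clique_bound(n: int, k: int) -> Fraction:
    """Union bound C(n,k) * 2^(-C(k,2)) on Pr[a random n-node graph has a k-clique]."""
    return Fraction(math.comb(n, k), 2 ** math.comb(k, 2))


def has_clique(adj: list[set[int]], k: int) -> bool:
    """Brute force: test every k-subset of vertices."""
    return any(all(v in adj[u] for u, v in combinations(S, 2))
               for S in combinations(range(len(adj)), k))


def approx_clique(adj: list[set[int]], k: int, eps: float) -> bool:
    """Sketch of algorithm A from Theorem 5 (k = k(n) supplied by the caller)."""
    if Fraction(eps) > clique_bound(len(adj), k):
        return False               # "just say no": wrong with probability below eps
    return has_clique(adj, k)      # eps is tiny, so exhaustive search is affordable
```

On a complete 10-vertex graph with $k = 8$ and $\epsilon = 0.01$, `approx_clique` answers "no" although the graph does contain an 8-clique; such graphs are exactly the rare instances the error budget $\epsilon$ pays for.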

In general, it seems unclear how to find a certificate that the graph does not contain a $k(n)$-clique in the first case above, in which $\epsilon$ is large. On the other hand, when $k(n) = cn$ for some constant $0 < c < 1$, I was able to devise a polynomial time on average algorithm:


Theorem 6  Let $0 < c < 1$ be fixed. Then $(\mathrm{Clique}(cn),\mu_0) \in$ AverP.


Proof:  The  algorithm  A  for  solving  this  problem  in  polynomial  time  on  average  is  a  bit  more 
complicated  than  that  in  the  previous  theorem.  Given  a  graph  G  =  (V,  E)  on  n  vertices,  the 
algorithm  works  as  follows: 


1.  Let $d = 1 + \lceil \lg(1/c) \rceil$. For each set $S$ of $d$ vertices, compute the number of vertices that are common neighbors of all the vertices of $S$. (A vertex is its own neighbor, and is also the neighbor of every other vertex with which it shares an edge.) If for every such set $S$ this number is less than $cn$, answer “no.”


2.  Otherwise, do the same thing for every set of $\ell = 3\lceil \lg n \rceil$ vertices. Again, if the number of common neighbors of every set of $\ell$ vertices is less than $cn$, answer “no.”

3.  Otherwise, do a brute-force search to determine if the graph contains a $cn$-clique.
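Steps 1 to 3 can be transcribed into Python roughly as follows. This is a sketch under assumptions: the graph is a hypothetical list of adjacency sets, a "$cn$-clique" is read as a clique on $\lceil cn \rceil$ vertices, and the guard `size <= k` stands in for the text's assumption that $n$ is large enough for $cn$ to exceed $d$ and $\ell$ (the certificate is only valid for sets no larger than the clique).

```python
import math
from itertools import combinations


def common_neighbors(adj: list[set[int]], S) -> set[int]:
    """Vertices that are neighbors of every vertex in S (a vertex is its own neighbor)."""
    return set.intersection(*(adj[v] | {v} for v in S))


def every_set_is_small(adj: list[set[int]], size: int, target: float) -> bool:
    """True if every set of `size` vertices has fewer than `target` common neighbors."""
    return all(len(common_neighbors(adj, S)) < target
               for S in combinations(range(len(adj)), size))


def clique_cn(adj: list[set[int]], c: float) -> bool:
    """Decide whether an n-node graph (n >= 1) has a clique on ceil(c*n) vertices."""
    n = len(adj)
    k = math.ceil(c * n)                        # size of a "cn-clique"
    d = 1 + math.ceil(math.log2(1 / c))         # step 1 set size
    ell = max(1, 3 * math.ceil(math.log2(n)))   # step 2 set size
    for size in (d, ell):                       # steps 1 and 2
        if size <= k and every_set_is_small(adj, size, c * n):
            return False                        # certificate: no cn-clique
    # Step 3: brute-force search over all k-subsets.
    return any(all(v in adj[u] for u, v in combinations(S, 2))
               for S in combinations(range(n), k))
```

Steps 1 and 2 cost $n^{O(1)}$ and $n^{O(\log n)}$ respectively; only step 3 is exponential, and the analysis below shows it is rarely reached on random graphs.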

Note that in cases 1 and 2, a certificate is obtained that the given graph has no $cn$-clique (assuming $n$ is so large that $cn$ exceeds $d$ and $\ell$), since, if the graph did contain a $cn$-clique, then any subset of the nodes forming that clique would have at least $cn$ common neighbors.

Let $S$ be a fixed set of $d$ vertices. Let $N$ be the random variable describing the set of vertices in $V - S$ that are adjacent to every node of $S$ when $G$ is randomly chosen. Then the probability that a vertex $v \in V - S$ is in $N$ is easily computed to be $2^{-d}$. Moreover, this event is independent of other vertices appearing or not appearing in $N$. Therefore, the cardinality of $N$ is distributed as the number of successes in $n - d$ trials of a Bernoulli variable which succeeds on each trial with probability $2^{-d}$. Since $d$ was chosen so that $2^{-d} \le c/2$, the expected size of $N$ is at most $cn/2$; thus, using Chernoff bounds [1, 11], it can be shown that $|N| \ge cn - d$ with probability at most $2^{-\Theta(n)}$. Note that this also bounds the probability that $S$ has $cn$ common neighbors in $V$. Therefore, the chance that any set of $d$ nodes has $cn$ common neighbors is at most $\binom{n}{d} \cdot 2^{-\Theta(n)} \le 2^{-\Theta(n)}$. Note also that step 1 takes time $n^{O(1)}$.
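As a numerical illustration (not a proof), the sketch below computes the binomial tail $\Pr[|N| \ge cn - d]$ exactly with rational arithmetic for concrete $n$, instead of bounding it with Chernoff bounds as the text does, and then applies the union bound over the $\binom{n}{d}$ sets $S$. The function names are hypothetical.

```python
import math
from fractions import Fraction


def step1_tail(n: int, c: float) -> Fraction:
    """Exact Pr[|N| >= cn - d] for |N| ~ Binomial(n - d, 2^-d), with d as in step 1."""
    d = 1 + math.ceil(math.log2(1 / c))
    q = 2 ** d                         # success probability is 1/q
    m = n - d                          # number of Bernoulli trials
    lo = max(0, math.ceil(c * n - d))
    # Pr[|N| = j] = C(m, j) * (1/q)^j * ((q-1)/q)^(m-j) = C(m, j) (q-1)^(m-j) / q^m
    total = sum(math.comb(m, j) * (q - 1) ** (m - j) for j in range(lo, m + 1))
    return Fraction(total, q ** m)


def step1_failure_bound(n: int, c: float) -> Fraction:
    """Union bound over all C(n, d) choices of S: Pr[step 1 does not answer "no"]."""
    d = 1 + math.ceil(math.log2(1 / c))
    return math.comb(n, d) * step1_tail(n, c)
```

Doubling $n$ from 200 to 400 with $c = 1/2$ shrinks the tail by many orders of magnitude, consistent with the $2^{-\Theta(n)}$ behavior claimed above.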

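The analysis at step 2 is similar, although Chernoff bounds are unnecessary. The chance that a fixed set of $\ell$ nodes has $cn - \ell$ common neighbors (again, excluding themselves) is at most $\binom{n-\ell}{cn-\ell} 2^{-\ell(cn-\ell)}$. Since $\ell \ge 3\lg n$, we have $2^{-\ell} \le n^{-3}$, so, summing over all $\binom{n}{\ell}$ sets of $\ell$ nodes, the chance that some such set exists is at most

$$\binom{n}{\ell}\binom{n-\ell}{cn-\ell}\, 2^{-\ell(cn-\ell)} < n^{\ell}\cdot(n-\ell)^{cn-\ell}\cdot n^{-3(cn-\ell)} < n^{3\ell-2cn}.$$

Also, this step takes time $n^{O(\log n)}$.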
The final step takes time $n^{cn+O(1)}$. Combining these facts, it follows that the expected running time for a random $n$-node graph is at most

$$n^{O(1)} + 2^{-\Theta(n)}\cdot n^{O(\log n)} + n^{3\ell-2cn}\cdot n^{cn+O(1)} \le n^{O(1)},$$

since $\ell = O(\log n)$ makes the last term $n^{O(\log n) - cn} = o(1)$. Therefore, by the remarks in Section 2, the algorithm runs in polynomial time on average. ■

An  ApproxP  algorithm  for  graph  coloring 

As mentioned in the introduction, there exist simple-minded algorithms for 3-coloring a graph in polynomial time on average [3, 18]. These algorithms are easily modified for $c$-coloring, for any constant $c$. It is apparently open whether $k(n)$-coloring is easy on average when $k(n) = \omega(1)$.

However, it is possible to construct an algorithm, similar to the one in Theorem 5, that approximately $k(n)$-colors a random graph when $k(n) = o(n/\log n)$. Below, $\mathrm{Color}(k(n))$ is the problem of deciding whether a graph is $k(n)$-colorable.

Theorem 7  Assume $k(n) = o(n/\log n)$ is polynomially computable. Then $(\mathrm{Color}(k(n)),\mu_0) \in$ ApproxP.

Proof: The algorithm $A$ that approximately solves this problem is very similar to the one described in the proof of Theorem 5. Given $\epsilon > 0$ and an $n$-node graph, $A$ compares $\epsilon$ with some threshold value. If $\epsilon$ is larger than this value, then $A$ answers “no”; otherwise, a brute-force search ensues. An appropriate threshold value is

$$\theta = n! \cdot n^k \cdot 2^{n/2 - n^2/(2k)},$$

where $k = k(n)$. First, if $\epsilon$ is less than $\theta$, then all $k^n$ possible colorings of the graph can be tried in time polynomial in $n$ and $1/\epsilon$ since $k(n) = o(n/\log n)$ (indeed, $k^n \le 2^{n\lg n}$, whereas $1/\theta = 2^{\omega(n\log n)}$). If $\epsilon$ is more than $\theta$, then a simple “no” suffices since $\theta$ bounds the probability that a random $n$-node graph is $k$-colorable. To see that this is so, note that a graph $G$ is $k$-colorable if and only if its vertex set $V$ can be partitioned into $k$ independent subsets. (A set is independent if no two vertices in the set are connected.) Since a fixed set $A_i$ is independent with probability $2^{-\binom{|A_i|}{2}}$, and these events are independent for disjoint sets, the probability that $G$ is $k$-colorable is at most

$$\sum_{A_1,\dots,A_k} \prod_{i=1}^{k} 2^{-\binom{|A_i|}{2}} = \sum_{A_1,\dots,A_k} 2^{-\sum_{i=1}^{k}\binom{|A_i|}{2}},$$

where the sum is over all partitions of $V$ into $k$ nonempty blocks $A_1,\dots,A_k$. The number of such partitions is loosely bounded by $n! \cdot n^k$. Furthermore,

$$\sum_{i=1}^{k}\binom{|A_i|}{2} = \sum_{i=1}^{k}\frac{|A_i|^2}{2} - \frac{n}{2} \ge \frac{n^2}{2k} - \frac{n}{2}$$

by a convexity argument. It follows that $\theta$ bounds the probability of a random graph being $k$-colorable. ■
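A minimal Python sketch of the algorithm in Theorem 7 follows, assuming the same hypothetical adjacency-set representation and a caller-supplied $k = k(n)$. The threshold $\theta$ is compared in log space (via `math.lgamma` for $\lg n!$), a choice made here only to avoid overflow and underflow.

```python
import math
from itertools import product


def log2_theta(n: int, k: int) -> float:
    """log2 of theta = n! * n^k * 2^(n/2 - n^2/(2k)), computed in log space."""
    return math.lgamma(n + 1) / math.log(2) + k * math.log2(n) + n / 2 - n * n / (2 * k)


def is_k_colorable(adj: list[set[int]], k: int) -> bool:
    """Brute force over all k^n colorings."""
    n = len(adj)
    edges = [(u, v) for u in range(n) for v in adj[u] if u < v]
    return any(all(col[u] != col[v] for u, v in edges)
               for col in product(range(k), repeat=n))


def approx_color(adj: list[set[int]], k: int, eps: float) -> bool:
    """Sketch of algorithm A from Theorem 7 for an n-node graph, n >= 1."""
    if math.log2(eps) >= log2_theta(len(adj), k):
        return False                 # eps >= theta: answer "no"
    return is_k_colorable(adj, k)    # eps < theta: exhaustive search
```

For small graphs $\theta$ exceeds 1, so every $\epsilon$ triggers the exhaustive search; for a 30-vertex graph with $k = 2$, $\theta$ is below $2^{-90}$ and the sketch answers "no" immediately, even for the (rare) 2-colorable inputs.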

5  Comparing  complexity  classes 

Levin’s  paper  opened  the  way  to  the  study  of  a  whole  family  of  new  complexity  classes.  This  section 
explores  some  of  the  relationships  among  these  classes. 

We have already discussed DistNP, AverP and ApproxP. A fourth class, discussed by Goldreich [5] and attributed to Ronnie Roth, is the class AverNP, a natural liberalization of DistNP. Specifically, AverNP consists of those problems $(D,\mu)$ for which $\mu$ is polynomially computable and some nondeterministic machine $M$ solves $D$ in time polynomial on average. That is, there exists a function $\ell : \Sigma^* \to \mathbb{N}$ such that $\ell$ is computable in polynomial time on average, and there exists a computation of $M$ that accepts $x$ in $\ell(x)$ steps if and only if $D(x) = 1$.

Goldreich makes the interesting observation that every problem in AverNP is reducible to the Bounded Halting Problem by a simple modification of the proof of completeness for DistNP. Note that this does not imply that AverNP $\subseteq$ DistNP, because the reduction used may require more than polynomial time in the worst case. On the other hand, it is easy to see that AverNP contains both AverP and DistNP.

The purpose of Section 3 was to find a problem in DistNP that is not in AverP unless every other problem in DistNP is as well, i.e., unless DistNP $\subseteq$ AverP. The assumption that DistNP $\not\subseteq$ AverP can in fact be derived from a more comfortable assumption from the theory of worst-case complexity: since AverP $\subseteq$ ApproxP (Theorem 4), it follows from the next theorem, which is a slight generalization of one proved by Ben-David et al. [2]:

Theorem 8  If $\mathrm{DTime}(2^{O(n)}) \neq \mathrm{NTime}(2^{O(n)})$, then DistNP $\not\subseteq$ ApproxP.

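Proof: Suppose to the contrary that DistNP $\subseteq$ ApproxP. Let $D$ be a decision problem in $\mathrm{NTime}(2^{O(n)})$. Then the unary problem $D'(1^x) = D(x)$ is in NP. (Here, string $x$ is associated with a natural number in the usual way.) Consider the distribution $\mu'(1^x) = z/x^2$, where $z$ is some normalization constant. Then $\mu$ is polynomially computable, and so $(D',\mu) \in$ DistNP $\subseteq$ ApproxP. Thus, there exists a Turing machine $M$ for which

$$\Pr_\mu[M(y,\epsilon) \neq D'(y)] < \epsilon$$

and that runs in time polynomial in $|y|$ and $1/\epsilon$. Note that this condition implies that if $\epsilon = \mu'(1^x) = z/x^2$, then $M(1^x,\epsilon) = D'(1^x) = D(x)$, since an error on $1^x$ alone would already have probability $\mu'(1^x) = \epsilon$. Therefore, on input $x$, $M(1^x, z/x^2)$ computes $D(x)$ in time polynomial in $x = O(2^{|x|})$, that is, in time $2^{O(|x|)}$. Hence $D \in \mathrm{DTime}(2^{O(n)})$; since $D$ was arbitrary, $\mathrm{NTime}(2^{O(n)}) \subseteq \mathrm{DTime}(2^{O(n)})$, contradicting the hypothesis. ■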
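The key step of the proof, choosing $\epsilon = \mu'(1^x)$, can be sketched in Python. This sketch rests on assumptions of my own: the "usual" string-to-number encoding is taken to be $x \mapsto$ the integer with binary representation $1x$ (so every string maps to a positive integer), $z = 6/\pi^2$, and the approximate solver `M` is a caller-supplied hypothetical function of $(y, \epsilon)$.

```python
import math

Z = 6 / math.pi ** 2  # normalization: the sum over m >= 1 of 1/m^2 is pi^2/6


def string_to_nat(x: str) -> int:
    """Hypothetical 'usual' encoding: binary string x -> int('1' + x, 2) >= 1."""
    return int("1" + x, 2)


def mu_prime_unary(x: str) -> float:
    """Probability mu'(1^m) = Z / m^2 of the unary string 1^m, where m = string_to_nat(x)."""
    return Z / string_to_nat(x) ** 2


def decide_D(M, x: str) -> int:
    """Compute D(x) from an approximate solver M for D', using eps = mu'(1^m).

    At this eps, M cannot err on 1^m: that single error would already have
    probability mu'(1^m) = eps, violating Pr[error] < eps.
    """
    m = string_to_nat(x)
    return M("1" * m, mu_prime_unary(x))
```

Since $m < 2^{|x|+1}$, both the input length $m$ and $1/\epsilon = m^2/z$ are $2^{O(|x|)}$, so a solver running in time polynomial in $|y|$ and $1/\epsilon$ yields a $2^{O(|x|)}$-time decision procedure for $D$.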

A fundamental question about these average-case complexity classes concerns their relationship to the worst-case complexity classes. For example, if $(D,\mu) \in$ AverP, what can be said about the complexity of $D$? The answer is: very little. As an extreme example, if $\mu$ concentrates all its probability mass on a single point, then there obviously exists a very fast (constant time on average) algorithm for $(D,\mu)$, even if $D$ itself requires an arbitrarily large amount of time to decide.

A more reasonable question, then, is to ask about the complexity of $D$ restricted to the support set of $\mu$. Specifically, let $D|_\mu$ be the decision problem defined by $D|_\mu(x) = D(x)$ if $\mu'(x) > 0$, and $D|_\mu(x) = 0$ otherwise. Further, for a distributional complexity class $\mathcal{C}$, let $\overline{\mathcal{C}}$ denote the class of decision problems

$$\overline{\mathcal{C}} = \{\, D|_\mu : (D,\mu) \in \mathcal{C} \,\}.$$

Now we can re-ask our question: how does $\overline{\mathrm{AverP}}$ fit into the time complexity hierarchy?

The  following  theorem  answers  this  question  more  generally: 

Theorem 9

•  $\overline{\mathrm{ApproxP}} \subseteq \mathrm{Exp}$, and

•  $\overline{\mathrm{AverNP}} \subseteq \mathrm{NExp}$.

Proof:  I  prove  the  first  part  only;  the  second  part  is  similar. 
Let $(D,\mu) \in$ ApproxP. Then there exists a Turing machine $M$ that solves $(D,\mu)$ approximately, so that

$$\Pr_\mu[M(x,\epsilon) \neq D(x)] < \epsilon,$$

and that runs in time polynomial in $|x|$ and $1/\epsilon$. Since $\mu$ is polynomially computable, we can easily decide whether $\mu'(x) = 0$. Moreover, since the length of the output of a machine computing $\mu$ is bounded by the machine’s running time, it follows that, for some polynomial $p$, $\mu'(x) \ge 2^{-p(|x|)}$ if $\mu'(x) > 0$. Since, as noted in the proof of the preceding theorem, $M(x,\mu'(x)) = D(x)$, running $M$ on $(x,\mu'(x))$ decides $D|_\mu(x)$ in time polynomial in $|x|$ and $2^{p(|x|)}$; hence $D|_\mu$ can be decided in exponential time. ■
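The decision procedure in this proof can be sketched in Python. The sketch assumes, for illustration only, that strings are indexed $0, 1, 2, \dots$ in standard order, that the cumulative distribution `mu` is given as an exact function returning `Fraction` values, and that `M` is a hypothetical approximate solver taking $(x, \epsilon)$.

```python
from fractions import Fraction


def mu_prime(mu, i: int) -> Fraction:
    """mu'(x_i) = mu(x_i) - mu(x_{i-1}) for strings indexed 0, 1, 2, ... in standard order."""
    return mu(i) - (mu(i - 1) if i > 0 else Fraction(0))


def decide_restriction(M, mu, i: int) -> int:
    """Decide D|_mu(x_i) following the proof of Theorem 9."""
    mass = mu_prime(mu, i)
    if mass == 0:
        return 0          # outside the support: D|_mu is 0 by definition
    # mass >= 2^-p(|x|), so running M(x, mass) takes at most exponential time.
    return M(i, mass)
```

Because `mu` is computed exactly, testing `mass == 0` is the easy support check mentioned in the text, and a positive `mass` is a rational with polynomially many bits, which is what keeps $1/\mu'(x)$ at most exponential.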

Finally, it can be shown that this last theorem cannot be improved:

Theorem 10

•  $\mathrm{Exp} \subseteq \overline{\mathrm{AverP}}$, and

•  $\mathrm{NExp} \subseteq \overline{\mathrm{AverNP}}$.

Proof:  Again,  I  only  prove  the  first  part. 

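Let $D \in$ Exp. Then $D$ is accepted by some machine $M$ in time $2^{p(|x|)}$ for some polynomial $p$. Let $\mu'(x) \propto 2^{-(p(|x|)+2|x|)}$. Then $\mu$ is polynomially computable, and $D|_\mu = D$ (since $\mu'(x) > 0$ for every $x$). Moreover, $M$ accepts $D$ in polynomial time on average (with respect to $\mu$) since, by an easy computation (there are $2^n$ strings of length $n$, so those strings contribute a total proportional to $2^n \cdot 2^{-2n} = 2^{-n}$),

$$\sum_x \mu'(x)\cdot 2^{p(|x|)} < \infty.$$

■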
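The "easy computation" can be checked with exact partial sums. In this sketch, the helper names and the choice of a concrete polynomial are illustrative assumptions; $p$ is any nonnegative integer-valued polynomial, and the weights $2^{-(p(n)+2n)}$ are the unnormalized values of $\mu'$.

```python
from fractions import Fraction


def partial_weighted_time(p, max_len: int) -> Fraction:
    """Sum over strings x with |x| <= max_len of w(x) * 2^p(|x|),
    where w(x) = 2^-(p(|x|) + 2|x|) is the unnormalized mu'(x)."""
    total = Fraction(0)
    for n in range(max_len + 1):
        weight = Fraction(1, 2 ** (p(n) + 2 * n))   # weight of each string of length n
        total += 2 ** n * weight * 2 ** p(n)        # 2^n strings, each running <= 2^p(n) steps
    return total


def partial_normalizer(p, max_len: int) -> Fraction:
    """Partial sum of the unnormalized weights w(x) over |x| <= max_len."""
    return sum((2 ** n * Fraction(1, 2 ** (p(n) + 2 * n)) for n in range(max_len + 1)),
               Fraction(0))
```

Each length contributes exactly $2^{-n}$ to the weighted sum, so it stays below 2 whatever $p$ is, while the normalizer is at least $2^{-p(0)}$; the expected running time is therefore bounded by a constant.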

Together, these theorems completely characterize $\overline{\mathrm{AverP}}$, $\overline{\mathrm{ApproxP}}$ and $\overline{\mathrm{AverNP}}$.

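Corollary 1

•  $\mathrm{Exp} = \overline{\mathrm{AverP}} = \overline{\mathrm{ApproxP}}$,

•  $\mathrm{NExp} = \overline{\mathrm{AverNP}}$, and

•  $\mathrm{NP} = \overline{\mathrm{DistNP}}$.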

Further, we can now (almost) fully characterize the containment relationships among these average-case complexity classes. This is summarized in the containment graph in Figure 1. An edge directed from $A$ to $B$ indicates that class $A$ is contained in class $B$. A dashed edge indicates that the containment question is open. This graph assumes that $\mathrm{NP} \neq \mathrm{Exp}$ and $\mathrm{DTime}(2^{O(n)}) \neq \mathrm{NTime}(2^{O(n)})$.

Note that two containment questions remain unresolved: is ApproxP contained in AverP, and is it contained in AverNP?

6  Summary 

In  this  paper,  I  have  reviewed  much  of  what  is  known  about  average-case  complexity,  though  I 
certainly  have  not  covered  everything.  I  have  described  Levin’s  framework  for  studying  average- 
case  complexity,  and  have  discussed  some  of  the  few  known  complete  distributional  problems.  I 
have  also  suggested  a  more  relaxed  notion  of  “easy  on  average,”  which  captures  the  notion  of  a 
problem  that  can  be  solved  “approximately.”  Finally,  I  have  discussed  how  the  new  average-case 
complexity  classes  relate  to  one  another. 


Figure  1:  The  containment  graph  for  some  average-case  complexity  classes 


Acknowledgements 

This  is  a  revised  version  of  a  paper  prepared  as  part  of  my  “area  exam.”  Thanks  first  to  Silvio  Micali, 
Charles  Leiserson  and  Michael  Sipser  for  serving  on  my  exam  committee,  and  for  their  comments 
and  advice.  Thanks  also  to  Rafail  Ostrovsky  and  Joel  Wein  for  their  help  and  encouragement. 

References 

[1]  Dana Angluin and Leslie G. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings. Journal of Computer and System Sciences, 18(2):155-193, April 1979.

[2]  Shai Ben-David, Benny Chor, Oded Goldreich, and Michael Luby. On the theory of average case complexity. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pages 204-216, May 1989.

[3]  Edward A. Bender and Herbert S. Wilf. A theoretical analysis of backtracking in the graph coloring problem. Journal of Algorithms, 6(2):275-282, June 1985.

[4]  M.  Garey  and  D.  Johnson.  Computers  and  Intractability:  A  Guide  to  the  Theory  of  NP- 
Completeness.  W.  H.  Freeman,  San  Francisco,  1979. 

[5]  Oded  Goldreich.  Towards  a  theory  of  average  case  complexity  (a  survey).  Technical  Report 
531,  Technion  Computer  Science  Department,  December  1988. 

[6]  Yuri Gurevich. Complete and incomplete randomized NP problems. In Proceedings of the Twenty-Eighth Annual Symposium on Foundations of Computer Science, pages 111-117, October 1987.

[7]  Yuri Gurevich. The challenger-solver game: Variations on the theme of P=?NP. Bulletin of the European Association for Theoretical Computer Science, October 1989.

[8]  Yuri  Gurevich.  Matrix  correspondence  problem  is  complete  for  the  average  case.  Unpublished 
manuscript,  November  1989. 


[9]  Yuri Gurevich. Average case completeness. Journal of Computer and System Sciences, to appear.

[10]  Yuri Gurevich and David McCauley. Average case complete problems. Unpublished manuscript, April 1987.

[11]  Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, March 1963.

[12]  John Hopcroft and Jeffrey Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, MA, 1979.

[13]  David  S.  Johnson.  The  NP-completeness  column:  An  ongoing  guide.  Journal  of  Algorithms, 
5(2):284-299,  June  1984. 

[14]  Leonid  A.  Levin.  Problems,  complete  in  “average”  instance.  In  Proceedings  of  the  Sixteenth 
Annual  ACM  Symposium  on  Theory  of  Computing,  page  465,  April  1984. 

[15]  Leonid A. Levin. Average case complete problems. SIAM Journal on Computing, 15(1):285-286, February 1986.

[16]  Phan  Dinh  Dieu,  Le  Cong  Thanh,  and  Le  Tuan  Hoa.  Average  polynomial  time  complexity  of 
some  NP-complete  problems.  Theoretical  Computer  Science,  46(2,  3):219-327,  1986. 

[17]  Ramarathnam  Venkatesan  and  Leonid  A.  Levin.  Random  instances  of  a  graph  coloring  problem 
are  hard.  In  Proceedings  of  the  Twentieth  Annual  ACM  Symposium  on  Theory  of  Computing, 
pages  217-222,  May  1988. 

[18]  Herbert S. Wilf. Backtrack: An O(1) expected time graph coloring algorithm. Information Processing Letters, 18(3):119-121, March 1984.

